8 research outputs found

    DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems

    The application of machine learning (ML) models to the analysis of optimization algorithms requires the representation of optimization problems using numerical features. These features can be used as input for ML models that are trained to select or to configure a suitable algorithm for the problem at hand. Since in pure black-box optimization information about the problem instance can only be obtained through function evaluations, a common approach is to dedicate some function evaluations to feature extraction, e.g., using random sampling. This approach has two key downsides: (1) it reduces the budget left for the actual optimization phase, and (2) it neglects valuable information that could be obtained from the problem-solver interaction. In this paper, we propose a feature extraction method that describes the trajectories of optimization algorithms using simple descriptive statistics. We evaluate the generated features on the task of classifying problem classes from the Black-Box Optimization Benchmarking (BBOB) suite. We demonstrate that the proposed DynamoRep features capture enough information to identify the problem class on which the optimization algorithm is running, achieving a mean classification accuracy of 95% across all experiments.
    Comment: 9 pages, 5 figures
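As a rough illustration of trajectory-based features, the sketch below records simple descriptive statistics (min, max, mean, standard deviation) of a population-based search at each iteration and concatenates them into a fixed-length vector. The optimizer (plain random search), the sphere objective, and the exact statistic set are illustrative assumptions, not the DynamoRep setup from the paper.

```python
import numpy as np

def sphere(x):
    """Example black-box objective (a sphere function, similar in spirit to BBOB f1)."""
    return float(np.sum(x ** 2))

def trajectory_features(objective, dim=5, pop_size=10, generations=3, seed=0):
    """Sketch of trajectory-based features: per-generation descriptive
    statistics of a simple random-search population (assumed setup)."""
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(generations):
        pop = rng.uniform(-5, 5, size=(pop_size, dim))    # candidate solutions
        fvals = np.array([objective(x) for x in pop])     # function evaluations
        # statistics over objective values ...
        feats.extend([fvals.min(), fvals.max(), fvals.mean(), fvals.std()])
        # ... and over each coordinate of the population
        feats.extend(pop.min(axis=0))
        feats.extend(pop.max(axis=0))
        feats.extend(pop.mean(axis=0))
        feats.extend(pop.std(axis=0))
    return np.array(feats)

f = trajectory_features(sphere)
print(f.shape)  # a fixed-length vector, usable as input to an ML classifier
```

With 3 generations and 5 dimensions this yields 3 * (4 + 4 * 5) = 72 features per run; the same vector shape is produced regardless of the objective, which is what makes such features usable across problem instances.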

    Assessing the Generalizability of a Performance Predictive Model

    A key component of automated algorithm selection and configuration, which are in most cases performed using supervised machine learning (ML) methods, is a well-performing predictive model. The predictive model uses the feature representation of a set of problem instances as input data and predicts the algorithm performance achieved on them. Common machine learning models struggle to make predictions for instances whose feature representations are not covered by the training data, resulting in poor generalization to unseen problems. In this study, we propose a workflow to estimate how well a predictive model for algorithm performance trained on one benchmark suite generalizes to another. The workflow has been tested by training predictive models across benchmark suites, and the results show that generalizability patterns in the landscape feature space are reflected in the performance space.
    Comment: To appear at GECCO 202
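The core of the workflow, training a performance predictor on one suite's landscape features and testing it on a suite whose features may fall outside the training region, can be mimicked with synthetic data. Everything below (the feature generator, the linear performance model, the k-NN regressor) is a fabricated stand-in for illustration only, not the study's actual models or benchmarks.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_suite(n, shift):
    """Synthetic benchmark suite: landscape features X and algorithm
    performance y (both fabricated for this illustration)."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 4))
    y = X @ np.array([0.5, -1.0, 0.3, 0.8]) + rng.normal(0, 0.1, n)
    return X, y

def knn_predict(X_train, y_train, X_test, k=5):
    """Minimal k-nearest-neighbour regressor standing in for the predictive model."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        preds.append(y_train[np.argsort(d)[:k]].mean())
    return np.array(preds)

X_a, y_a = make_suite(200, shift=0.0)   # training suite
X_b, y_b = make_suite(100, shift=0.0)   # test suite with similar feature distribution
X_c, y_c = make_suite(100, shift=3.0)   # test suite with shifted feature distribution

err_in = np.mean(np.abs(knn_predict(X_a, y_a, X_b) - y_b))
err_out = np.mean(np.abs(knn_predict(X_a, y_a, X_c) - y_c))
print(err_in, err_out)
```

The prediction error grows once the test features leave the region covered by training data, which is exactly the generalizability gap the proposed workflow aims to quantify before deploying a model on a new suite.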

    FooDis: a food-disease relation mining pipeline

    Keeping up with the new biomedical knowledge presented in the scientific literature is crucial. To this end, information extraction pipelines can automatically extract meaningful relations from textual data, which then require additional checks by domain experts. In the last two decades, a lot of work has been done on extracting relations between phenotype and health concepts; however, relations involving food entities, which are among the most important environmental concepts, have never been explored. In this study, we propose FooDis, a novel information extraction pipeline that employs state-of-the-art approaches in natural language processing to mine abstracts of biomedical scientific papers and automatically suggest potential cause or treat relations between food and disease entities in different existing semantic resources. A comparison with already known relations indicates that the relations predicted by our pipeline match 90% of the food-disease pairs common to our results and the NutriChem database, and 93% of the common pairs in the DietRx platform. The comparison also shows that the FooDis pipeline can suggest relations with high precision. The FooDis pipeline can further be used to dynamically discover new relations between food and diseases, which should be checked by domain experts and can then be used to populate some of the existing resources used by NutriChem and DietRx.
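FooDis itself relies on state-of-the-art NLP components; the toy sketch below only illustrates the overall shape of such a pipeline: spot food and disease entities in a sentence, then use cue phrases to suggest a cause or treat relation candidate for expert review. The entity dictionaries and cue phrases are invented placeholders, not FooDis resources.

```python
# Toy dictionaries standing in for real food/disease entity linkers
FOODS = {"green tea", "garlic", "red meat"}
DISEASES = {"hypertension", "colorectal cancer"}
CAUSE_CUES = {"increases the risk of", "is associated with"}
TREAT_CUES = {"lowers", "reduces", "protects against"}

def suggest_relations(sentence):
    """Suggest (food, relation, disease) candidates from a single sentence."""
    s = sentence.lower()
    foods = [f for f in FOODS if f in s]
    diseases = [d for d in DISEASES if d in s]
    if not (foods and diseases):
        return []                     # no co-occurring entity pair, nothing to suggest
    if any(c in s for c in TREAT_CUES):
        rel = "treat"
    elif any(c in s for c in CAUSE_CUES):
        rel = "cause"
    else:
        return []                     # entities co-occur but no relation cue found
    return [(f, rel, d) for f in foods for d in diseases]

print(suggest_relations("Green tea lowers blood pressure in patients with hypertension."))
```

In a real pipeline the dictionary lookup would be replaced by trained named-entity recognition and the cue matching by a relation classifier; the output here is deliberately only a candidate list, mirroring the paper's point that suggested relations still require validation by domain experts.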

    From language models to large-scale food and biomedical knowledge graphs

    Knowledge about the interactions between dietary and biomedical factors is scattered throughout countless research articles in unstructured form (e.g., text, images, etc.) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist; however, they require further extension with relations between food and biomedical entities. In this study, we evaluate the performance of three state-of-the-art relation-mining pipelines (FooDis, FoodChem and ChemDis), which extract relations between food, chemical and disease entities from textual data. We perform two case studies in which relations were automatically extracted by the pipelines and validated by domain experts. The results show that the pipelines can extract relations with an average precision of around 70%, making new discoveries available to domain experts with reduced human effort, since the experts only need to evaluate the results instead of finding and reading all new scientific papers.

    CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources

    Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low-resourced. Annotated corpora are very useful tools for researchers and experts of the domain in question, as well as for data scientists performing analyses. In this paper, we present the annotation process of food consumption data (recipes) with semantic tags from different semantic resources—the Hansard taxonomy, the FoodOn ontology, the SNOMED CT terminology and the FoodEx2 classification system. FoodBase is an annotated corpus of food entities—recipes—which includes a curated version of 1000 instances, considered a gold standard. In this study, we use the curated version of FoodBase and two different annotation approaches—the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the gold standard of the FoodBase corpus, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data—recipes—annotated with semantic tags from the aforementioned four external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further utilized for developing and training information extraction pipelines that use state-of-the-art NLP approaches to trace knowledge about food safety applications.

    Improving Nevergrad’s Algorithm Selection Wizard NGOpt Through Automated Algorithm Configuration

    Algorithm selection wizards are effective and versatile tools that automatically select an optimization algorithm given high-level information about the problem and the available computational resources, such as the number and type of decision variables, the maximal number of evaluations, the possibility to parallelize evaluations, etc. State-of-the-art algorithm selection wizards are complex and difficult to improve. In this work, we propose the use of automated configuration methods to improve their performance by finding better configurations of the algorithms that compose them. In particular, we use elitist iterated racing (irace) to find CMA configurations for specific artificial benchmarks that replace the hand-crafted CMA configurations currently used in the NGOpt wizard provided by the Nevergrad platform. We discuss in detail the setup of irace for the purpose of generating configurations that work well over the diverse set of problem instances within each benchmark. Our approach improves the performance of the NGOpt wizard, even on benchmark suites that were not part of the tuning by irace.
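A minimal sketch of the racing idea behind irace: candidate configurations are evaluated instance by instance, and the worst performer is repeatedly eliminated until a few survivors remain. The toy performance model and the simple drop-the-worst rule below are illustrative assumptions; irace's actual elimination uses statistical tests and keeps elite configurations across iterations.

```python
import numpy as np

def run_config(sigma, instance_seed):
    """Toy stand-in for running an optimizer with step size `sigma` on one
    benchmark instance; returns a final objective value (fabricated model:
    performance is best near sigma = 0.5, plus instance-specific noise)."""
    r = np.random.default_rng(instance_seed)
    return (sigma - 0.5) ** 2 + r.normal(0, 0.01)

def race(configs, n_instances=20, keep=2):
    """Minimal racing loop: evaluate all surviving configurations on each
    new instance, then drop the one with the worst mean until `keep` remain."""
    alive = list(configs)
    scores = {c: [] for c in alive}
    for i in range(n_instances):
        for c in alive:
            scores[c].append(run_config(c, instance_seed=i))
        if len(alive) > keep:
            worst = max(alive, key=lambda c: np.mean(scores[c]))
            alive.remove(worst)
    return alive

survivors = race([0.1, 0.3, 0.5, 0.9, 1.5])
print(survivors)  # configurations nearest the (assumed) optimum survive
```

Because poor configurations are discarded early, the evaluation budget concentrates on the promising ones, which is what makes racing affordable for tuning expensive wizard components over diverse instance sets.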